Summary

A 5'-terminal leader sequence of 35 nucleotides was 
found to be present on multiple trypanosome RNAs. 
Based on its representation in cDNA libraries, we
estimate that many, if not all, trypanosome mRNAs 
contain this leader. This same leader was originally 
identified on mRNAs encoding the molecules re- 
sponsible for antigenic variation, variant surface gly- 
coproteins. Studies of selected cDNAs containing 
this leader sequence revealed that leader-contain-
ing transcripts can be stage-specific, stage-regu-
lated, or constitutive. They can be abundant or rare,
and transcribed from single or multigene families. 
No linkage between the genomic leader sequences 
and the structural gene exons was observed. Pos- 
sible mechanisms by which the leader sequences 
are added to trypanosome mRNAs are discussed. 

Introduction 

The surface of pathogenic African trypanosomes is cov- 
ered with a densely packed coat composed of a single 
protein species, the variant surface glycoprotein (VSG). 
The VSG gene repertoire of the trypanosome contains 
some 300 to 1000 genes (Van der Ploeg et al., 1982a), 
yet only one VSG gene at a time is expressed. Switching 
transcription from one VSG gene to another leads to 
antigenic variation, allowing the parasite to evade the 
immune response of its mammalian host. The result is the 
relapsing parasitemia of African sleeping sickness. Some 
of the molecular and biological aspects of antigenic varia- 
tion have been recently reviewed (Parsons et al., 1984a). 

The molecular mechanism that assures the transcrip- 
tional activation of a single VSG gene remains unknown. 
Comparisons of the genomic contexts of VSG genes have 
revealed that certain VSG genes undergo duplication when 
activated (Hoeijmakers et al., 1980a; Pays et al., 1981a; 
Longacre et al., 1983; Parsons et al., 1983). The new 
expression-linked copy is located at a different site in the 
genome and is transcribed (Pays et al., 1981b; Bernards 
et al., 1981). Other VSG genes appear to be activated in
situ, as they undergo no detectable alteration in genomic 
organization when expressed (Young et al., 1982). Tran- 
scriptionally active VSG genes invariably reside in regions 
of the genome relatively devoid of restriction enzyme 
recognition sites (the "barren" regions) (Van der Ploeg et 
al., 1982b) and close to what appears to be a telomere 
(DeLange and Borst, 1982; Williams et al., 1982). These
data suggest that some aspects of the genomic location 
of VSG genes may be important for their activation. How-
ever, analyses of restriction enzyme sites located 5' to the
barren regions indicate that there are several sites in the
genome from which VSG genes can be transcribed
(Longacre et al., 1983; Myler et al., submitted; Allison et
al., submitted). Furthermore, while a VSG gene must reside
in one of these sites in order to be transcribed, the converse
is not true; occupation of a particular site does not guar-
antee transcription (Buck et al., 1984; Allison et al., sub-
mitted).

Whether transcribed from duplication- or nonduplication-
activated genes or from genes residing in similar or distinct
genomic locations, all VSG mRNAs share the same 35 
nucleotide sequence at their 5' terminus (Van der Ploeg
et al., 1982c; Boothroyd and Cross, 1982). This untrans- 
lated sequence is not encoded by DNA contiguous to the 
structural gene, and therefore has been termed the spliced 
leader (SL). Sequences encoding the SL are repeated 
100-200 times in the genome of Trypanosoma brucei, and 
each resides in a 1.4 kb unit monomer (DeLange et al.,
1983; Nelson et al., 1983). The vast majority of these units
are directly and tandemly repeated to form a large array(s). 
However, a few 1.4 kb units, with their resident SL se- 
quences, are dispersed from the tandem array (Nelson et 
al., 1983; Parsons et al., 1984b) and are termed orphons
according to the nomenclature of Childs et al. (1981).
Surprisingly, neither the large array nor the orphons are
detectably linked (i.e., within 50 kb) to active VSG genes.
Nevertheless, the tandem array of SL reiteration units has 
been proposed to mark the 5' boundary of the VSG 
expression site (DeLange et al., 1983). Immediately 5' to 
the SL in the 1.4 kb repeat unit are sequences resembling
eucaryotic RNA polymerase II promoters (DeLange et al., 
1983); according to the multiple promoter hypothesis 
above, these function in the initiation of VSG gene tran- 
scription. Only that VSG gene placed downstream from 
the array through DNA rearrangement would be tran- 
scribed, thus only a single VSG gene at a time would be 
expressed. 

Certain clues, however, indicate that the function of the 
SL repeat is more complex. First, the SL is transcribed not 
only by the mammalian bloodstream stage of the parasite, 
which expresses VSG and undergoes antigenic variation, 
but also by procyclic culture forms (analogous to the insect
midgut stage of the trypanosome life cycle), which do not 
express VSG (DeLange et al., 1983; Parsons et al., 1984b). 
Second, sequences homologous to the SL are highly 
reiterated in the genomes of certain trypanosomatids that 
do not undergo antigenic variation, such as Trypanosoma
cruzi, the intracellular parasite that causes Chagas disease,
and Leptomonas collosoma, a parasite of insects (Nelson
et al., 1984). These findings suggest that although the SL
is used by African trypanosomes for VSG expression, it is 
an ancient sequence that may fulfill other more fundamen-
tal functions in gene expression in these organisms. This
would predict the presence of the SL on other trypano-
some transcripts as well as on VSG mRNA. To test this
hypothesis we have screened cDNA libraries made from
T. brucei bloodstream or procyclic RNA for clones hybrid-
izing to a synthetic probe complementary to 22 nucleotides
of the 35 nucleotide SL. Recombinant clones which did 
not encode VSG, but which contained the SL, were de- 
tected in both libraries. The expression of the correspond- 
ing genes is regulated differently from VSG genes; they 
are transcribed during both bloodstream and procyclic
stages. As with VSG genes, however, the SL is not en-
coded contiguously to these various structural genes, and
is apparently derived from a transcript originating else- 
where in the genome. Our data suggest that rather than 
providing for transcriptional exclusivity in VSG gene expres- 
sion, the SL plays a universal role in trypanosome gene 
expression. 

Results 

Isolation of cDNA Clones Containing the SL

Only bloodstream-stage trypanosomes synthesize VSG 
mRNA (Agabian et al., 1980; Overath et al., 1983; Parsons
et al., 1984b), and then, of the thousand-odd VSG genes,
each variant antigen type (VAT) transcribes only the VSG 
gene that is ultimately expressed as its surface coat 
(Hoeijmakers et al., 1980b; Pays et al., 1980). Thus, in 
Figure 1A, when total RNA is fractionated by agarose gel 
electrophoresis, transferred to nitrocellulose membranes, 
and hybridized to a 32P-labeled probe (Northern analysis),
a prominent species is detected by a cDNA encoding VSG 
3 in RNA isolated from bloodstream-stage cells of VAT 3 
(lane B). This RNA, of about 1.9 kb, and a putative 
precursor of about 4 kb (seen only upon overexposure of 
the autoradiogram), are not observed in procyclic RNA 
(lane P) nor in bloodstream-stage RNA isolated from other 
VATs (not shown). As previously reported (Parsons et al., 
1984b), when the same RNAs are hybridized to a probe 
complementary to 22 nucleotides of the 35 nucleotide SL 
(see Figure 2) a very different pattern is obtained (Figure 
1B). Here, a smear of RNAs ranging in size from 0.5 to 
over 6 kb is detected in both procyclic (lane P) and 
bloodstream stage (lane B). In particular, a 1.9 kb species
is seen in VAT 3 RNA (lane B). Ribosomal RNAs (their 
position indicated by the hash marks) do not hybridize to 
the synthetic probe. The SL-containing RNAs are not 
simply transcripts of the genomic 1.4 kb repeat unit in 
which the SL resides, as rehybridization of the same blot 
to a clone containing the 1.4 kb genomic repeat reveals 
no hybridizing RNA (not shown). Since many of the RNAs 
revealed by the SL probe were smaller than the mature
VSG mRNA, it seemed unlikely that they represented VSG
mRNA processing intermediates. We therefore hypothe- 
sized that the smear of RNAs hybridizing to the synthetic 
probe represents transcripts derived from other genes that 
also use the SL or an SL-like sequence. 

Figure 1. Northern Analyses

Total bloodstream VAT 3 RNA (B) and procyclic RNA (P) were fractionated
on adjacent lanes (in quadruplicate) of a single gel. Four blots were prepared
and hybridized to the probes listed below. When blots were re-used, we
removed the previous probe with the appropriate denaturation conditions,
verified by autoradiography. The probes used were as follows: (A) VSG 3
cDNA; (B) 22-mer SL; (C) pSLc1 cDNA; (D) pSLc2 cDNA; (E) pSLc4 cDNA;
(F) pSLc3 cDNA.

As a first step in testing this hypothesis, we used the
SL probe to screen two cDNA libraries, one made from
RNA isolated from VAT 5 and one made from RNA isolated
from procyclic cells derived from VAT 5. In the procyclic
library of 1000 clones, 24 clones were found that hybrid-
ized with the synthetic probe. Of approximately 600 recom-
binant clones in the VAT 5 library, 25 VSG-encoding
cDNAs were detected, but only one of these contained 
5'-terminal sequences as defined by hybridization to the 
SL probe. Seventeen other clones also hybridized with the 
SL probe. By analogy with VSG transcripts, the SL should 
be located at the 5' terminus of transcripts giving rise to 
these clones. If only 5% of our cDNAs contain 5'-terminal 
sequences (as is the case for the VSG clones), then the 
numbers cited above indicate that approximately half of 
the clones in the library are derived from transcripts con- 
taining the SL. Thus, as predicted from the Northern 
analyses, SL-containing transcripts are abundant in stages 
of the life cycle when VSG is (bloodstream) or is not 
(procyclic) expressed. Most of the SL-containing cDNAs 
were represented only once in the libraries, implying that 
the complexity of such transcripts is high. 
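
The estimate in the preceding paragraph can be retraced with a few lines of arithmetic. The following is only an illustrative sketch: the clone counts are those quoted in the text, the 5% figure for 5'-complete cDNAs is the assumption stated in the text, and the function name and the subtraction of the 25 VSG clones from the VAT 5 library total are assumptions made here for illustration.

```python
def sl_fraction(sl_positive, clones_screened, five_prime_complete=0.05):
    """Estimate the fraction of clones derived from SL-containing transcripts.

    Only cDNAs that extend to the 5' terminus can hybridize to the SL probe,
    so the observed hit rate is divided by the assumed fraction of
    5'-complete clones (5% in the text).
    """
    observed = sl_positive / clones_screened
    return observed / five_prime_complete


# Procyclic library: 24 SL-positive clones out of 1000.
procyclic = sl_fraction(24, 1000)

# VAT 5 (bloodstream) library: 17 non-VSG SL-positive clones among the
# roughly 600 - 25 = 575 clones that do not encode VSG (an assumption
# about how the counts should be partitioned).
bloodstream = sl_fraction(17, 600 - 25)

print(f"procyclic:   {procyclic:.0%}")
print(f"bloodstream: {bloodstream:.0%}")
```

Under these assumptions both libraries give values in the neighborhood of one half, which is the order of magnitude cited in the text; the result scales inversely with the assumed 5% figure, so it should be read as a rough estimate only.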

Nucleotide Sequence Analysis of SL cDNAs 

To determine whether the hybridizing sequences con- 
tained in these cDNAs were identical with the SL found on 
VSG mRNAs (Figure 2) or were divergent SL-like se- 
quences, and to search for possible protein-coding se- 
quences, the hybridizing regions of four cDNAs (two from 
the bloodstream-stage library, and two from the procyclic 
library) were cloned into M13 bacteriophage and their 
nucleotide sequences were determined. As shown in Fig- 
ure 2, each clone contained a complete or partial SL. In 
the case of pSLc1 and pSLc3, the SL was truncated; the
cDNA clone contained 23 and 26 bp, respectively, of the 
35 nucleotide SL sequence. We interpret these to be 
incomplete copies of their corresponding transcripts. Since 
the synthetic probe is complementary to the 3' portion of 
the SL, and since the cDNA and probe sequences match 
perfectly, these clones were easily detected by the SL 
probe. Clone pSLc4 contained a complete SL and, 5' to 
that, 3 nucleotides that correspond to those immediately 
5' to the SL in the genomic repeat unit. These data suggest 
that sequences 5' to the SL may be transcribed, as has
been proposed by Boothroyd and Cross (1982). The fourth
clone, pSLc2, contained a complete and perfect SL. An
additional 93 bp (indicated by an asterisk in Figure 2) are
found 5' to the SL in this clone. Subsequent hybridization
analysis suggested that this 93 bp sequence is derived
from sequences within the structural gene, implying that
its appearance 5' to the SL resulted from an artifact of
cDNA cloning. Michiels et al. (1983) have described a VSG
cDNA containing sequences 5' to the SL.

Figure 2. Nucleotide Sequence Analysis
The sequence of the 35 nucleotide SL and the
sequence of the 22 nucleotide probe complemen-
tary to the SL are shown at the top. Below are
shown partial sequences of several cDNAs that
hybridize to the SL probe (the SL portion is under-
lined). Putative reading frames are translated into
amino acid sequences. The asterisk (*) indicates
that another 92 bp of sequence was found 5' to
the SL (see text).

SL          AACGCTATTA TTAGAACAGT TTCTGTACTA TATTG
22-mer               T AATCTTGTCA AAGACATGAT A

pSLc1                    AGAACAGT TTCTGTACTA TATTGT ATCA AATAATAAGA GAATTAACTT
            TGTAGATAAA GAAAGCAATA AAGCATCA ATG AGC GGA AAG GAA GTT GGA GGT
                                           met ser gly lys glu val gly gly

pSLc2      *AACGCTATTA TTAGAACAGT TTCTGTACTA TATTGG TTCG CTTTAACTTG CCAGTACGCT
            TGTGAAGCGG TT

pSLc3               CA TTAGAACAGT TTCTGTACTA TATTGA CTAC CTTCTCGTTA GTGTAACAAG
            TCCTCTGCAG TG ATG ATG TTC GGT CGC CCC GCC GTC CCC CAG GCA ACC TGG
                          met met phe gly arg pro ala val pro gln ala thr trp
            GAA GAG AAG TAT TTT TAT CAA AAA CTT CAC CAT CTT TTC GAC CAT GCT
            glu glu lys tyr phe tyr gln lys leu his his leu phe asp his ala
            GCT GAT TGG TTC GTA ACG AAG GTT AAC TGG TGG ATG CCG TCT ATC GGT
            ala asp trp phe val thr lys val asn trp trp met pro ser ile gly
            GCC GGG ATG GTG CTC AGT CTC T
            ala gly met val leu ser leu

pSLc4   ACT AACGCTATTA TTAGAACAGT TTCTGTACTA TATTGT GCCA CTAGCGAAGG GGGCGAAGGA
            GACCGAAGAG GAGAGGGGTT AATAATTTGT GTAACTATTA CCTGTA ATG TTG CGT CTC
                                                               met leu arg leu
            TGC CGT GTG TCA CTG CGT GTC CAG TCA CAC CAG AAG AAG CGC GCA CAG CAC
            cys arg val ser leu arg val gln ser his gln lys lys arg ala gln his
            CCC AAC GCC GGC ACA CGG TTT GGA CGT GTG TAC AAT CGC GGT TTC ATT CGG
            pro asn ala gly thr arg phe gly arg val tyr asn arg gly phe ile arg
            TAC GGC TTC GGT GGT TTC GGC AT
            tyr gly phe gly gly phe gly
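
The relationship between the 22-mer probe and the SL, and the reason a truncated clone such as pSLc1 is still detected, can be checked directly from the Figure 2 sequences. The sketch below is illustrative only: the sequences are transcribed from Figure 2 as they appear in this text (so any transcription error carries over), the probe is taken as drawn in the figure (aligned base-for-base under the SL), and all names are hypothetical helpers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Sequences transcribed from Figure 2 (spaces removed).
SL = "AACGCTATTATTAGAACAGTTTCTGTACTATATTG"
# The 22-mer as drawn in the figure, i.e. aligned under the SL, so it reads
# 3'->5' from left to right.
PROBE_AS_DRAWN = "TAATCTTGTCAAAGACATGATA"
# SL portion retained by the truncated clone pSLc1 (23 nt).
PSLC1_SL_PART = "AGAACAGTTTCTGTACTATATTG"


def complement(seq):
    """Base-by-base complement, without reversing the strand."""
    return seq.translate(COMPLEMENT)


def probe_target_span(sl, probe_as_drawn):
    """Return the 1-based (start, end) SL positions paired with the probe."""
    target = complement(probe_as_drawn)  # the SL bases the probe pairs with
    start = sl.find(target)
    if start < 0:
        raise ValueError("probe is not complementary to the SL")
    return start + 1, start + len(target)


start, end = probe_target_span(SL, PROBE_AS_DRAWN)

# Overlap between the probe target and the SL bases present in pSLc1.
clone_start = SL.find(PSLC1_SL_PART) + 1
overlap = end - max(start, clone_start) + 1

print(len(SL), len(PROBE_AS_DRAWN))  # lengths of the SL and of the probe
print(start, end)                    # SL positions paired with the probe
print(overlap)                       # probe bases that can pair with pSLc1
```

With the sequences as transcribed here, most of the probe can still pair with the SL portion retained in pSLc1, which is consistent with the statement that the truncated clones were readily detected.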

The start codons of VSG mRNAs are located some 30 
to 80 nucleotides downstream from the SL. We further
analyzed these clones for the presence of a start codon
and the existence of a possible reading frame. Our results
indicate that, like the VSG leaders, these leaders are
composite, containing the 5' SL followed by additional
noncoding sequences. For example, the first ATG triplet is
encountered 53 bp downstream from the SL in pSLc1. In
pSLc3, a start codon is found 37 bp downstream from the
SL, followed by a reading frame extending at least 159
nucleotides. pSLc4 contains a reading frame that starts 72
nucleotides downstream from the SL, extending at least
135 nucleotides. This region encodes a 45 amino acid
polypeptide rich in basic amino acids. Thus there is no
reason to suspect that these cDNAs are not derived from
bona fide mRNAs since they contain protein-coding read-
ing frames.
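
The search for a start codon downstream of the SL can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: it uses the pSLc1 sequence as transcribed from Figure 2 in this text, the standard genetic code, and hypothetical helper names.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

SL_3PRIME_END = "TATTG"  # last five nucleotides of the SL

# pSLc1 insert as transcribed from Figure 2 (spaces removed).
PSLC1 = (
    "AGAACAGTTTCTGTACTATATTG"        # truncated SL (23 nt)
    "TATCAAATAATAAGAGAATTAACTT"      # noncoding leader downstream of the SL
    "TGTAGATAAAGAAAGCAATAAAGCATCA"
    "ATGAGCGGAAAGGAAGTTGGAGGT"       # start of the putative reading frame
)


def first_orf(cdna, sl_end=SL_3PRIME_END):
    """Return (nucleotides between the SL and the first ATG, peptide)."""
    body_start = cdna.index(sl_end) + len(sl_end)
    atg = cdna.find("ATG", body_start)
    if atg < 0:
        raise ValueError("no ATG downstream of the SL")
    peptide = []
    # Translate complete codons until a stop codon or the end of the clone.
    for pos in range(atg, len(cdna) - 2, 3):
        residue = CODON_TABLE[cdna[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return atg - body_start, "".join(peptide)


leader, peptide = first_orf(PSLC1)
print(leader, peptide)
```

For pSLc1 as transcribed here, the distance from the end of the SL to the first ATG agrees with the 53 bp quoted in the text, and the one-letter peptide corresponds to the met-ser-gly-lys-glu-val-gly-gly translation shown in Figure 2.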

Expression of Genes that Use the SL 

The one well-studied class of genes that employs the SL,
VSG genes, shows stage-specific expression. It was there-
fore interesting to determine whether stage-specific
expression was a general feature of genes that use the
SL. The transcripts that correspond to several of the SL-
containing cDNAs were examined by Northern analysis of
RNA isolated from the two easily obtained stages of the 
trypanosome life cycle: bloodstream and procyclic stages. 
The RNA blots shown in Figures 1A and 1B were rehy- 
bridized with SL cDNA probes corresponding to single- 
copy genes (see below). In each case one major RNA 
species was detected, ranging in size from 1.3 to 2.3 kb,
depending on the cDNA used as probe (Figure 1, C-F).
The two clones from the procyclic library, pSLc3 (Figure
1F) and pSLc4 (Figure 1E), also detect additional higher
molecular weight RNAs in both bloodstream and procyclic
samples. For example, pSLc4 hybridizes primarily to a 1.4
kb RNA, but also detects 1.75 and 2.0 kb RNAs. (These
are more clearly visible on longer autoradiographic expo-
sures.) Unlike VSG genes, all of these genes are tran-
scribed during both bloodstream and procyclic phases of
the trypanosome life cycle (compare lanes P and B). One
of the cDNAs isolated from the bloodstream-stage library,
pSLc1, detects a transcript that appears more abundant
in the bloodstream than in procyclic stage (Figure 1C),
while pSLc3, isolated from the procyclic library, detects a
transcript that is more abundant in procyclic RNA (Figure
1F). Since the SL cDNA probes had specific activities
similar to that of the VSG cDNA probe, but required
autoradiographic exposures some five to ten times longer,
it seems that the abundance of each of these transcripts
is quite low compared to that of VSG gene transcripts
(Figure 1A).

Figure 3. Genomic Southerns

DNA isolated from trypanosomes of VAT 3 (3B), VAT 5 (5B), or procyclic
cells derived from VAT 5 (5P) was cleaved with Eco RV. Two micrograms
of each sample was electrophoresed in triplicate on an agarose gel and
transferred to nitrocellulose paper and hybridized with nick-translated SL
cDNAs. The hash marks on the left indicate the position of DNA markers
at 23 kb, 9.6 kb, 6.6 kb, 4.3 kb, and 2.3 kb. Probes were: A, VSG 5 DNA;
B, pSLc2; C, pSLc5.

Genomic Organization of Genes That Use the SL 

Genomic Southern analysis using SL cDNA probes reveals 
several classes of genes whose transcripts contain the 
SL. These experiments were performed under conditions 
too stringent for stable hybridization of the 35 nucleotide 
SL with its corresponding genomic sequences. Figure 3 
shows Eco RV-cleaved genomic DNA from bloodstream-
stage VATs 3 (3B) and 5 (5B) and from the procyclic
population obtained by in vitro differentiation of VAT 5
(5P). When hybridized with a VSG 5 cDNA probe, the
hallmark properties of many VSG genes are visible (Figure
3A): the presence of a multigene family, an expression-
linked copy (the extra gene copy in VAT 5 as compared
to VAT 3), and the retention of the expression-linked copy
by procyclic cells (Parsons et al., 1984b). In addition, the
expression-linked copy, as well as one other member of 
the VSG 5 gene family, resides on restriction fragments 
that show variations in size. Such variation is characteristic 
of VSG genes that are located adjacent to a telomere and 
is correlated with cell growth or division (Bernards et al., 
1983). 



In contrast, in this and many other restriction digests, 
pSLc2 hybridizes to a single genomic sequence (Figure 
3B), indicating that pSLc2 detects a single-copy gene. 
These data also suggest that there are no large introns 
within the portion of the gene detected by the cDNA probe 
(an intron between the SL and the structural gene would 
not be detected, since the SL sequence does not hybridize 
to its corresponding genomic sequences under these 
conditions). Unlike VSG genes, this gene shows no varia- 
tion between VATs or life-cycle stages. Of six SL-contain- 
ing cDNA clones used as probes in Southern analyses, 
four appeared to detect single-copy, nonvarying genes. 
The remaining two cDNAs detected nonvarying multigene
families such as that shown in Figure 3C. Since all trypan- 
osome telomeric sequences thus far studied show varia- 
tion in length over time, it appears that, unlike many VSG 
genes, these other genes that employ the SL do not reside 
adjacent to a telomere. Again, unlike active VSG genes, 
the other genes that employ the SL are not situated in 
DNA devoid of restriction sites (see also Figure 5). None 
of the SL cDNAs detected sequences that comigrate with 
any genomic 1.4 kb SL repeat units, suggesting they are 
not closely linked to the tandem array or the major SL 
orphons (not shown). 

Cloning of Genes Encoding SL cDNAs

Although no evidence of close linkage between genomic
SL sequences and the genes detected by SL cDNAs was
observed in Southern analyses, it might be difficult to
detect a single SL dispersed from the tandem array under
the conditions employed. We therefore cloned genomic
sequences corresponding to four of the SL-containing
cDNAs, previously determined to detect single-copy
genes. VAT 5 DNA was partially digested with Bam HI or
Mbo I, fragments ranging from 12 to 20 kb were cloned
into bacteriophage λ 1059, and recombinants that hybrid-
ized with the SL cDNA probes were isolated. None of
these clones hybridized with more than one cDNA, indi-
cating that the genes examined are not closely linked.
None of the genomic clones hybridized with the SL, or
with the genomic 1.4 kb SL repeat unit, even under relaxed
conditions (3x SSC, 50°C). For example, in Figure 4, cDNA
clone pSLc1 (lane C) and genomic DNA clone λ(SL)g1
(lane G) were digested with Cla I and hybridized either to
a 5' 250 bp subclone of pSLc1 (Figure 4A) or to the SL
probe (Figure 4B). Both the cDNA and the genomic clones
hybridize with the 5' 250 bp fragment, but only the cDNA
hybridizes with the SL.

The absence of the SL in these recombinants could be
the result of cloning only the 3' portion of the gene. To
rule out this possibility, representative genomic clones of
each of the four genes were analyzed by restriction en-
zyme mapping. In each case, clones were found that
contained several (3-12) kilobases of DNA upstream from
the structural gene (Figure 5). The genomic clone dis-
cussed above and shown in Figure 4 contained 8 kb 5'
to the structural gene. The restriction sites present in each
cDNA, up to but not including the Xmn I site contained in
the SL (see Figures 2 and 5), were also present in the
homologous genomic clones. For example, the Hinc II site,
located 37 bp from the 3' end of the SL in pSLc2, and the
Pst I site, located 28 bp from the SL in pSLc3, are each
found in the corresponding λ(SL)g2 and λ(SL)g3 genomic
clones. In both cases, these sites are in the 5' untranslated
regions of the molecule. Thus the 3' portion of the leader
sequence is derived from sequences abutting the struc-
tural gene, while the 5' portion containing the SL is tran-
scribed from elsewhere in the genome.

Figure 4. Hybridization of the SL Probe to Genomic and cDNA Clones
Genomic clones detected by pSLc1 were isolated. Genomic clone λ(SL)g1-3
(lane G) and cDNA clone pSLc1 (lane C) were cleaved with Cla I. Cla I
has recognition sites within both of the vectors, as well as within the cloned
structural gene (see Figure 5A). (A) Hybridization with a 250 bp 5' fragment
of pSLc1. (B) Hybridization with the SL probe.

Figure 5. The Genomic Environment of Selected Genes That Use the SL
In each case, the top line shows the restriction map derived from overlap-
ping genomic clones, while the lower line shows the map of the correspond-
ing cDNAs. In some cases only the restriction sites closest to the gene are
shown. Note that the Xmn I site (X) present in the SL was not observed in
the corresponding position in the genomic clones. Restriction enzyme sites
are marked as follows: Bgl I, Z; Bgl II, G; Bam HI, B; Cla I, C; Eco RI, E;
Hinc II, H; Hind III, D; Kpn I, K; Mlu I, M; Nco I, N; Pst I, P; Pvu II, V; Sal I, L;
Sma I, S; Sph I, Y; Sst I, T; Xho I, O; Xmn I, X.

(A) pSLc1 and its genomic environment; (B) pSLc2 and its genomic
environment; (C) pSLc3 and its genomic environment; (D) pSLc4 and its
genomic environment.

Discussion 

The experiments described here demonstrate that the 35 
nucleotide SL is not unique to VSG mRNAs but is also 
present on many other trypanosome RNAs. Estimates
based on the frequency of SL-containing clones in cDNA
libraries suggest that a substantial fraction of cytoplasmic
mRNAs may carry this 5' sequence. Sequences common
to many different cellular mRNAs have been described in
other eucaryotic systems, but they are not 5'-terminal. For
example, a 3'-terminal "suffix" sequence is found on ap-
proximately 2% of all Drosophila mRNAs (Tchurikov et al.,
1982). Similarly the 3' "Set 1" sequence is highly repre-
sented on mouse mRNAs transcribed during the first half
of embryogenesis (Murphy et al., 1983). Another sequence
is found in the intron regions of many RNA species in the
rat: the "ID" sequences (Milner et al., 1984). This se-
quence appears to be a marker for brain-specific tran-
scripts (Sutcliffe et al., 1984). 

Each of these elements is repeated in their respective
genomes, but unlike the tandemly repeated SL sequences, 
they are dispersed. They are encoded adjacent to or within 
the structural genes on whose transcripts they are found, 
again contrasting with the trypanosome SL, which is not 
detectably linked to the structural genes. The differences 
between the SL and other repetitive elements found on 
multiple transcripts, in particular regarding their locations 
in the genome and on the transcripts, indicate that different 
molecular mechanisms are responsible for their ubiquity. 

Perhaps more similar to the trypanosome SL is the case 
of the coronavirus leader. Coronavirus RNAs are actually
a series of nested transcripts that are 3' coterminal and
extend various distances 5'. Each RNA, however, shares
the same 5'-terminal leader sequence (Spaan et al., 1983).

What mechanism accounts for the presence of the SL 
on trypanosome RNAs? Figure 6 depicts several alternative 
arrangements of SL and structural gene exons, and their 
resulting primary transcripts. Sequences just upstream 
from the SL resemble consensus sequences of eucaryotic
RNA polymerase II promoters. The large tandem array of
SL repeat units has therefore been hypothesized to provide 
multiple promoters for frequent initiation of VSG gene 
transcription (DeLange et al., 1983). However, the regula- 
tion of the transcription of the other genes whose tran- 
scripts contain the SL differs from that of VSG genes; 
those studied here are transcribed during both blood-
stream and procyclic stages, and appear to be much less
abundant than VSG mRNAs. Could all of these genes, with 
such different characteristics, be using the same promoter 
(see Figure 6A)? Recall that as with VSG genes, genomic
sequences encoding the SL were not found within 3-12
kb of the structural genes encoding the SL-containing
RNAs. Nor do any of these genes appear to be linked to
one another. Thus if there is a single promoter used by all
these genes, the primary transcription unit must be
hundreds of kilobases long, provide for stage-specific and
abundant transcription of the distal VSG gene (transcribed
VSG genes are adjacent to a telomere), and moderate
transcription of many other genes throughout the trypan-
osome life cycle.

Figure 6. Models of SL and Structural Gene Transcription
In each case, the upper line depicts the genomic configuration and the
lower line shows the primary transcription products. The large boxes
indicate structural genes, with that labeled V being the active VSG gene.
As indicated by the line terminating in a circle, the active VSG gene lies
adjacent to a telomere. The small boxes are the genomic SL
sequences, either in the tandem array or in the orphons.

(A) One continuous transcription unit for all genes that use the SL. (B)
Separate transcription units for each structural gene that uses the SL. (C)
Discontinuous transcription, subsequent RNA joining. (D) Discontinuous
transcription, SL RNA used as primer.

Another possibility is that each of these genes has its 
own SL promoter, located far upstream (Figure 6B). Al-
though each of the SL repeat units appears very similar, we
have detected a low level of sequence heterogeneity
between repeat units (unpublished results), which might
be hypothesized to allow for differences in the regulation
of transcription. For example, the developmentally regu-
lated VSG gene could be colinear with (albeit distant from)
the major array of repeat units and the three to four SL
orphons of our T. brucei stock could serve as promoters 
for other SL genes. However, there are many more than 
four non-VSG genes that employ the SL sequence. Even 
if there were several unidentified orphons or several tan- 
dem arrays, there do not appear to be sufficient unlinked 
SL transcription units for each gene to have its own. A 
combination of the two hypotheses, with several large 
transcription units, one colinear with developmentally reg-
ulated genes, and one colinear with constitutively ex-
pressed genes, could also be proposed.

We believe it more likely that the SL repeat units are not 
linked to the structural genes, and that transcription is 
discontinuous. Two different hypotheses could be pro-
posed. First, the SL and structural gene sequences
could be transcribed from separate DNA molecules, with
the two RNA molecules then joined by intermolecular
splicing or ligation (Figure 6C). Alternatively, the SL tran-
script could aid in the initiation of structural gene transcrip-
tion (Figure 6D), as host RNA fragments serve as primers
for transcription of influenza virus genes (Plotch et al.,
1981). Interestingly, this reaction does not require homol-
ogy between primer and virus sequences. The coronavirus
leader sequence has also been postulated to function as
a primer in discontinuous transcription (Spaan et al., 1983).
With regard to these latter hypotheses, one would predict 
the presence of small SL-containing RNAs. We have re- 
cently characterized an RNA species of approximately 135 
bp that hybridizes to the SL probe (Milhausen et al., 
submitted). 

It is clear that the role of the SL is not limited to VSG 
gene expression in African trypanosomes; it is found in
other trypanosomatids that do not undergo antigenic vari- 
ation (Nelson et al., 1984), and on RNAs derived from 
genes whose pattern of expression differs dramatically 
from that of VSG genes. Although as yet no structural 
genes, aside from those encoding VSGs or tubulins, have 
been studied in trypanosomes, identification of the function 
of the genes employing the SL may serve to clarify its role
in trypanosomatid gene expression. Alternatively, the SL 
may be present on virtually all trypanosome mRNAs, re-
sulting from a requisite role in the transcription or RNA
processing machinery of the cell. The possibility of a
unique process critical to gene expression in these path-
ogenic organisms may provide a target for rational chemo-
therapy of trypanosomiasis.

Experimental Procedures 

Trypanosomes 

The IsTaR serodeme of Trypanosoma brucei brucei was employed for all 
studies (Stuart et al., 1984). Cells of VAT 5 were converted to procyclic
forms by in vitro culture (Hanas et al., 1975). DNA for Southern analyses
and RNA for construction of cDNA libraries were isolated from VAT 5 cells
and from procyclic cells cultivated for at least 2 months (Parsons et al.,
1983; Milhausen et al., 1983). Total RNA isolated from VAT 3 cells or
procyclic cells derived from VAT 5 was a gift of Dr. Jean Feagin. 

cDNA and Genomic Clones

The construction of cDNA libraries in pBR322 has been described (Parsons
et al., 1983). The generation of a library of genomic DNA (from VAT 1,5
cells) in bacteriophage λ 1059 is described by Aline et al. (submitted). SL-
containing cDNAs were detected by hybridization with 32P-labeled synthetic
probe complementary to 22 nucleotides of the 35 nucleotide SL (see Figure
2) using hybridization conditions previously determined to detect specific
SL sequences in the genome (see below).

Those cDNAs used for studies reported here were pTbSLc1-1(B),
pTbSLc2-1(B), pTbSLc3-1(P), pTbSLc4-1(P), and pTbSLc5-1(P) and for
convenience are designated in the text as pSLc1 through pSLc5, respec-
tively. pSLc1 and pSLc2 were isolated from the bloodstream-stage cDNA
library while pSLc3, pSLc4, and pSLc5 were isolated from the procyclic-
stage library. Genomic clones corresponding to pSLc1 are designated
λ(SL)g1-n (where n is the clone number); those corresponding to pSLc2
are designated λ(SL)g2-n, etc. Thirty-four λ(SL)g1 clones, four λ(SL)g2
clones, ten λ(SL)g3 clones, and ten λ(SL)g4 clones were isolated.

The portions of selected cDNAs that hybridized to the synthetic probe 
were subcloned into M13 vectors mp 8, mp 9, mp 18, or mp 19. The 
sequence was determined by the dideoxy chain termination method (San- 
ger et al., 1977). 

Hybridization Analyses 

Total RNA was fractionated by electrophoresis in 1.4% agarose-formalde- 
hyde gels and transferred to nitrocellulose membranes (Milhausen et al., 
1983). Hybridizations were in 5x SSPE (1x SSPE is 180 mM NaCl, 10 mM 
NaH2PO4, 1 mM EDTA, pH 7.4), 0.2% sarkosyl, and 200 μg/ml denatured 
salmon testis DNA for 16-30 hr at 66°C (for cDNA probes) or 37°C (for the 
22-mer probe). The final stringency of posthybridization washes was 0.3x 
SSC at 65°C for cDNA probes and 5x SSC at 37°C for the SL probe (1x 
SSC is 150 mM NaCl, 15 mM Na citrate, pH 7.0). Southern analyses were 
performed as previously described (Parsons et al., 1983). The final strin- 
gencies for posthybridization washes were the same as those described 
above, except for VSG cDNA hybridization, which employed a 0.1x SSC, 
65°C final wash. cDNAs were labeled with 32P by nick translation to a 
specific activity of approximately 10* cpm/μg. The 22-mer, a gift of Dr. 
Philip Barr (Chiron Corporation), was labeled at its 5' end to a specific 
activity of approximately 5 x 10* cpm per pmole as previously described 
(Nelson et al., 1983). 

Abstract. The small (40S) subunit of eukaryotic ribo- 
somes is believed to bind initially at the capped 5'-end 
of messenger RNA and then migrate, stopping at the 
first AUG codon in a favorable context for initiating 
translation. The first-AUG rule is not absolute, but 
there are rules for breaking the rule. Some anomalous 
observations that seemed to contradict the scanning 
mechanism now appear to be artifacts. A few genuine 
anomalies remain unexplained. 



The scanning mechanism for initiation of translation in 
eukaryotes was proposed 10 years ago (112). Support- 
ing evidence has accumulated at a slower rate than one 
might have wished, but a trickle sustained over 10 yr forms 
a decent-sized pond. Some remarkable experiments now be- 
ing carried out in yeasts are yielding important new insights 
about scanning and other aspects of initiation, and the power 
of the yeast system promises additional breakthroughs in the 
coming years. Thus, it seems a good time to review what 
we've learned so far about how eukaryotic ribosomes select 
a particular AUG codon as the start site for translation. 

Evidence from Higher Eukaryotes 

The scanning model states that the 40S ribosomal subunit 
(carrying Met-tRNAi^met and various initiation factors) binds 
initially at the 5'-end of mRNA and then migrates, stopping 
at the first AUG codon in a favorable context for initiating 
translation. The model thus posits that both position (prox- 
imity to the 5'-end) and context contribute to selection of the 
initiation site. The simplest evidence of the importance of 
position is that the "first-AUG rule" holds for some 90-95% 
of the hundreds of vertebrate mRNA sequences that have 
been analyzed (127). A detailed discussion of position ef- 
fects, along with an explanation for the 5-10% deviation 
from the first-AUG rule, will follow after a brief introduction 
to the context requirements for initiation. 

From a recent survey of 699 vertebrate mRNAs (127), 
GCCGCC(A/G)CCAUGG emerged as the consensus sequence 
for initiation in higher eukaryotes, confirming and extending 
previously noted trends (117, 120). Site-directed mutagenesis 
experiments in which the expression of a cloned preproinsu- 
lin gene was monitored in transfected COS cells have 
confirmed the importance of G+4 as well as each of the con- 
sensus nucleotides from position -1 through -6 (122, 
126).¹ The importance of context was demonstrated by tar- 
geting mutations to the vicinity of the AUG initiator codon 
for preproinsulin as well as by targeting an upstream, out-of- 

1. Numbering begins with the A of the AUG codon as position +1; nucleo- 
tides 5' to that site are assigned negative numbers. 



frame AUG codon; in the latter case the inhibitory effect of 
the upstream AUG codon (i.e., its ability to block the initia- 
tion of preproinsulin from a downstream site) increased as 
the surrounding nucleotides increasingly resembled the con- 
sensus sequence. The importance of a purine in position -3 
was confirmed by the discovery of a type of thalassemia in 
which a change in sequence, from CACCAUG to CCCCAUG, 
drastically impairs initiation of translation of α-globin (167). 
Experiments confirming (66, 92, 192, 248, 255) and exploit- 
ing the effects of context on initiation (169, 181, 222, 263)² 
have recently been reported from other systems. A purine 
(usually A) in position -3 is the most highly conserved 
nucleotide in all eukaryotic mRNAs, including those of ver- 
tebrates, plants (81), and fungi (188); and a mutation in that 
position affects translation more profoundly than a point mu- 
tation anywhere else (122). Indeed, as long as there is a pu- 
rine in position -3, deviations from the rest of the consensus 
sequence only marginally impair initiation. In the absence 
of a purine in position -3, however, G+4 is essential for 
efficient translation (122) and the contributions of other near- 
by nucleotides can be detected (126). For practical purposes, 
an initiator codon can usually be designated "strong" or 
"weak" by considering only positions -3 and +4; I shall fol- 
low that convention in the rest of this paper. 
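
The two-position convention can be sketched in a few lines of Python. This is only an illustrative sketch, not code from any cited work. The function name classify_aug_context and the exact cutoff are assumptions: an AUG is called "strong" when position -3 is A, or when position -3 is G and position +4 is G, and "weak" otherwise. That mirrors the description of an unfavorable context given later in this review (position -3 is C or U, or position -3 is G and position +4 is not G). A missing flanking nucleotide is treated as a mismatch, which is also an assumption.

```python
def classify_aug_context(mrna: str, aug_index: int) -> str:
    """Label an AUG codon 'strong' or 'weak' from positions -3 and +4 only.

    aug_index is the 0-based index of the A of the AUG (position +1).
    Assumed shorthand rule: A at -3 is strong; G at -3 is strong only
    when +4 is also G; everything else is weak.
    """
    seq = mrna.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG codon at the given index")
    # Position -3 is three nucleotides 5' of the A; +4 follows the G.
    minus3 = seq[aug_index - 3] if aug_index >= 3 else ""
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else ""
    if minus3 == "A" or (minus3 == "G" and plus4 == "G"):
        return "strong"
    return "weak"


# The thalassemia example from the text: CACCAUG versus CCCCAUG.
print(classify_aug_context("CACCAUG", 4))  # A at -3
print(classify_aug_context("CCCCAUG", 4))  # C at -3
```

The rule is deliberately coarse: it ignores positions -1, -2, and -4 through -6, whose smaller contributions are described above.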

The scanning model predicts that proximity to the 5'-end 
determines which AUG codon (in a good context) actually 
functions as the initiator codon. The nearly simultaneous 
discoveries of the m7G cap (232) and of "silent" 3'-cistrons 
in many viral mRNAs (243) constituted the first strong evi- 
dence that eukaryotic ribosomes are somehow restricted to 
initiating near the 5'-end. Some of the early in vitro evidence 
for silent 3'-cistrons has recently been confirmed in vivo (70, 
281). In lieu of a scanning mechanism to explain the 5' re- 
striction in viral mRNAs, one might have postulated that 
the primary and/or secondary structure around a particular 



2. In reference 222, although improving the context around the AUG codon 
increased the number of transformants 10-fold, the yield of protein could 
not be elevated above a threshold that was set by inefficiency at some later 
step in expression. 
AUG triplet (which just happened to be 5' proximal) caused 
it to be the preferred initiation site. To rigorously test the im- 
portance of position, therefore, a plasmid with four identical 
copies of the preproinsulin "ribosome binding site" was con- 
structed and tested (118); the outcome, exclusive use of the 
first site in the tandem array, strongly supports a scanning 
mechanism. The scanning model predicts that, upon in- 
troducing an adventitious upstream AUG codon (in a good 
context), initiation should shift to the upstream site. Thus, 
if an upstream AUG codon is in the wrong reading frame, 
it should, and does, depress the yield of protein from the nor- 
mal site (10, 39, 70, 105, 144, 156, 198, 214, 282). If a strong 
AUG codon is introduced upstream from, and in the same 
reading frame as, the normal initiator codon, the result is a 
stretched polypeptide with an NH2-terminal amino acid ex- 
tension. This has been documented with laboratory con- 
structs (87, 118, 132, 148, 228, 244) and the same principle 
operates with some cellular and viral genes that use two pro- 
moters for transcription. In the latter cases, the longer tran- 
script has an extra AUG codon (in a good context) upstream 
from the start of the shorter transcript; ribosomes initiate ex- 
clusively at the first AUG codon in each form of mRNA, 
producing "long" and "short" versions of the encoded protein 
that often mediate different biological functions (130). (If the 
5'-proximal AUG codon in the longer mRNA were in a less 
favorable context, both versions of the protein could be trans- 
lated from a single mRNA, as described below.) 
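
The predicted consequences of a strong upstream AUG codon can be written out as a short Python sketch. It is a hypothetical helper, not part of any published analysis: the function name upstream_aug_effect and the three result labels are invented for illustration, and the upstream AUG is assumed to lie in a good context. An out-of-frame upstream AUG is predicted to depress the yield from the normal site, while an in-frame upstream AUG with no intervening terminator codon is predicted to give an NH2-terminal extension. The in-frame case with an intervening terminator is reported separately, because it falls under reinitiation rather than extension.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def upstream_aug_effect(mrna: str, upstream_aug: int, normal_start: int) -> str:
    """Predict the effect of a strong upstream AUG on the normal start site.

    Both arguments are 0-based indices of the A of an AUG codon.
    Returned labels are illustrative names, not standard terminology.
    """
    seq = mrna.upper().replace("T", "U")
    for index in (upstream_aug, normal_start):
        if seq[index:index + 3] != "AUG":
            raise ValueError(f"no AUG codon at index {index}")
    if upstream_aug >= normal_start:
        raise ValueError("upstream AUG must precede the normal start site")
    # Out of frame: ribosomes start upstream and read the wrong frame.
    if (normal_start - upstream_aug) % 3 != 0:
        return "reduced_yield"
    # In frame: look for a terminator between the two AUG codons.
    for j in range(upstream_aug + 3, normal_start, 3):
        if seq[j:j + 3] in STOP_CODONS:
            return "terminated_upstream_orf"
    return "n_terminal_extension"


print(upstream_aug_effect("CCACCAUGGCCGCCAUGGCCGCCUAA", 5, 14))
```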

Besides the wealth of evidence relating to position effects, 
here is some of the other evidence that supports and defines 
the scanning mechanism: (a) That the 40S ribosomal sub- 
unit/factor complex can migrate was deduced from the prop- 
erties of complexes formed between reovirus mRNAs and 
wheat germ ribosomes in the presence of edeine (131). Those 
observations were subsequently confirmed with extracts 
from mammalian cells (35). (b) Other in vitro experiments 
revealed that sliding requires ATP hydrolysis and that, in the 
absence of ATP, 40S subunits are trapped upstream from the 
AUG codon, as the scanning model predicts (115). (c) The 
inability of eukaryotic ribosomes to bind to circular mRNAs 
is strong evidence against direct binding at the AUG start site 
(109, 113). That site was identical in the linear and circular 
forms of each template, but only the linear form could bind 
to wheat germ or reticulocyte ribosomes. Control experi- 
ments showed that both linear and circular templates bound 
to bacterial ribosomes, emphasizing the fundamental differ- 
ence between prokaryotes and eukaryotes in the mechanism 
of initiation (119). (d) Translation of various mRNAs is in- 
hibited when a hairpin structure, or a DNA-RNA hybrid, or 
a hybrid with anti-sense RNA is introduced near the 5'-end 
of the mRNA in a way that does not occlude the AUG codon 
or the m7G cap (36, 75, 123). The simplest interpretation, 
albeit not yet verified by footprinting, is that 40S ribosomal 
subunits can still bind to such mRNAs but cannot migrate be- 
yond the duplex barrier. 

Evidence from Yeasts 

Yeast mRNA sequences resemble those of beasts in that 
translation begins at the first AUG codon in 95% of the genes 
examined (29). The rare occurrence of upstream AUG 
codons is neither an accident, nor a problem, but a means 
to regulate the translation of some interesting genes (168, 
250, 261, 277). Two yeast genes have been subjected to ex- 
haustive genetic probing, the results of which strongly sup- 
port a scanning mechanism. One set of experiments began 
with a CYC1 allele in which the normal AUG initiator codon 
had been inactivated (235). From the structure of revertants 
that had regained cytochrome c function, it appeared as if an 
AUG codon introduced anywhere within a stretch of 37 
nucleotides could function in initiation. Donahue and Cigan 
(44) used a complementary approach to reach the same basic 
conclusion. They began with a specially designed HIS4 al- 
lele in which initiation at the 5'-proximal AUG codon gave 
a His- phenotype, while initiation from the next AUG down- 
stream restored gene expression. That unique genetic be- 
havior enabled them to select for mutations that reduced or 
abolished ribosomal recognition of the first AUG codon. The 
only point mutations identified in the search were alterations 
in the AUG codon itself, elegantly confirming the impor- 
tance of "position" and suggesting that, if flanking sequences 
affect recognition of the AUG codon in yeast, the effects must 
be subtle. 

When the latter question was addressed directly, by mutat- 
ing nucleotides in the vicinity of the AUG codon, only mod- 
est effects (twofold or less) were found (6, 31, 277, 287). In- 
deed, although yeast mRNA sequences almost always have 
an A in position -3, the rest of the vertebrate consensus se- 
quence is not evident (29), suggesting a different role for (or 
different sensitivity to) context in the two systems. Even the 
conserved A at -3 might simply reflect the overall A richness 
of 5'-noncoding sequences in lower eukaryotes. The signifi- 
cance of the A-rich, G-deficient leader sequences on mRNAs 
from yeasts and other lower eukaryotes (26, 48, 51, 104, 172, 
208, 245) is unknown, but an inevitable consequence is that 
such leader sequences lack extensive secondary structure, 
which seems to be more of an impediment to translation in 
yeasts (6, 31) than in higher eukaryotes (123). 

Other recent experiments from Donahue's laboratory have 
yielded one remarkable new insight and validated one criti- 
cal old assumption about the mechanism of initiation of 
translation. This time Donahue's group began by inactivating 
the AUG initiator codon in the yeast HIS4 gene. Their subse- 
quent search for second-site suppressor mutations led them 
to eIF2, the protein factor that escorts Met-tRNAi^met onto 
the ribosome. They found that a mutation in the β subunit 
of eIF2 suppressed the His- phenotype by allowing ribo- 
somes to initiate at the first UUG codon in HIS4 mRNA (45). 
Thus, eIF2 is an active participant (and the only protein fac- 
tor so far implicated by genetic criteria!) in the mRNA 
binding-and-scanning step of initiation. The elegant follow- 
up experiment was to change the anticodon sequence from 
3'-UAC-5' to 3'-UCC-5' in one of the tRNAi^met genes of S. 
cerevisiae; the mutant form of tRNAi^met directed ribosomes 
to initiate at (the first) AGG instead of the usual AUG codon 
(30). This is direct proof that the initiator codon is recog- 
nized primarily by base pairing with the anticodon in Met- 
tRNAi^met, as had always been (only) supposed. And the ex- 
periment is compelling proof of scanning. 
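
The base-pairing logic of the anticodon experiment can be checked with a small Python sketch. It assumes strict Watson-Crick pairing and ignores wobble. The names codon_for_anticodon and first_start_site and the test message are made up for illustration. Because the anticodon is written 3' to 5' and the codon 5' to 3', the two strands align position by position, so each base is simply complemented.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon_3_to_5: str) -> str:
    """Return the codon (5'->3') that pairs with an anticodon written 3'->5'."""
    return "".join(WATSON_CRICK[base] for base in anticodon_3_to_5.upper())


def first_start_site(mrna: str, anticodon_3_to_5: str) -> int:
    """Index of the first triplet, scanning 5'->3', that pairs with the anticodon.

    Returns -1 when the message contains no matching triplet.
    """
    return mrna.upper().find(codon_for_anticodon(anticodon_3_to_5))


message = "CCAGGCCAUGG"  # hypothetical message with an AGG upstream of an AUG
print(codon_for_anticodon("UAC"), first_start_site(message, "UAC"))
print(codon_for_anticodon("UCC"), first_start_site(message, "UCC"))
```

In this toy message the wild-type anticodon selects the AUG, whereas the mutant anticodon selects the AGG that lies closer to the 5'-end, as in the experiment described above.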

Explicable Exceptions to the First-AUG Rule 

There are a number of well-characterized viral and cellular 
mRNAs in which translation is not limited to the AUG codon 
nearest the 5'-end, but even these "exceptional" mRNAs ad- 
here to rules that are consistent with a scanning mechanism. 
For example, although initiation is not restricted to the first 
AUG codon in the examples discussed below, initiation is at 
least limited to AUG codons in the vicinity of the 5'-end. In 
no case do eukaryotic ribosomes initiate de novo in the mid- 
dle of an mRNA. In several cases in which one or more small 
open reading frames (ORFs)³ precede the major ORF, the 
small upstream ORFs are translated (65, 76, 91, 101, 277). 
The "problem," therefore, is not that the 5'-proximal AUG 
codon is missed but that it is not used exclusively. 

Initiation at downstream AUG codons occurs, not haphaz- 
ardly, but under three specific conditions: (a) When there are 
fewer than 10 nucleotides between the cap and the first AUG 
codon, ribosomes may initiate at the first and second AUG 
codons (99, 199, 251). This rule, deduced from the behavior 
of natural mRNAs, has not yet been verified by systematic 
experiments; it is supported, however, by some results of 
manipulating late leader sequences on SV-40 mRNAs (39, 
72). (b) When the first AUG codon lies in an unfavorable 
context for initiation (i.e., when position -3 is C or U; or 
when position -3 is G and position +4 is not G), "leaky 
scanning" enables some ribosomes to reach and initiate at the 
second AUG codon. The hypothesis (125) that leaky scan- 
ning underlies the ability of bifunctional viral mRNAs to di- 
rect the synthesis of two separately initiated proteins is sup- 
ported by the effects of mutations on the translation of two 
such mRNAs. Thus, Sedman and Mertz (228) probed the 
translation of SV-40 late 19S mRNA by introducing muta- 
tions near the 5'-end, and found that the relative yields of 
VP2 and VP3 (the second initiated downstream from the 
first) varied in accordance with the scanning rules. Mutagen- 
esis of the Sendai virus P/C gene (where proteins P and C 
are translated from one mRNA in different, overlapping 
reading frames) also gave results consistent with leaky scan- 
ning, with the interesting twist that initiation occurs from 
three sites in that message: the first ACG codon functions be- 
cause it lies in an excellent context (GCCACGG), but func- 
tions poorly because it is not AUG; the next AUG codon in 
a rather poor context (CGCAUGG) functions inefficiently; 
thus most 40S subunits advance to and initiate at the third 
start site, AAGAUGC (37, 74). Thus far, 17 viral mRNAs 
(reviewed in reference 124) have been shown to produce two, 
or rarely three, overlapping proteins by initiating at a weak 
upstream, as well as the next downstream, AUG codon. An 
important addition to the list of bifunctional mRNAs is the 
pX transcript of human T-cell leukemia virus type I, which 
directs synthesis of both p27 (from the first, weak AUG 
codon) and p40 (173, 241). The recently determined se- 
quences of turnip yellow mosaic virus genomic RNA (165) 
and one of the simian rotavirus SA11 gene segments (164) 
suggest that they, too, should produce two proteins by leaky 
scanning. The only viral mRNA known to produce two pro- 
teins from overlapping reading frames, the first of which in- 
itiates with an anomalously strong AUG codon, is the 
NA/NB mRNA of influenza B virus (234). The explanation 
in that case might involve slippage on the run of A residues 
flanking the first AUG codon. (c) The third condition that al- 
lows access to internal AUG codons is reinitiation. When an 
AUG codon upstream from the start of the major protein cod- 
ing sequence lies in a favorable context, thereby precluding 
leaky scanning, ribosomes can still reach the downstream 
initiation site provided that a terminator codon occurs in- 



3. Abbreviation used in this paper: ORF, open reading frame. 



frame with the first AUG codon and upstream from the sec- 
ond (121, 128, 148). In such cases the first (invariably small) 
ORF is translated (65, 76, 91, 101, 277), after which 40S 
ribosomal subunits apparently resume scanning and reiniti- 
ate farther downstream. (If initiation factors dissociate from 
the 40S subunit slowly rather than instantaneously, upon the 
addition of a 60S subunit and commencement of peptide 
bond formation, reinitiation might be possible after the 
translation of an upstream "minicistron," as has been ob- 
served, but not after the translation of a full-sized cistron. 
That as-yet-untested hypothesis would explain why 3'-cis- 
trons in viral mRNAs are usually silent [243].) The influence 
of flanking sequences on AUG codon recognition follows the 
same hierarchy in the reinitiation mode as in the primary 
scanning mode (122), but not every downstream AUG codon 
in a good context is an efficient site for reinitiation. The 
efficiency of reinitiation at a downstream AUG codon stead- 
ily improves, for example, as its distance from the upstream 
ORF increases. Consequently, reinitiation is not limited 
to the first strong AUG codon after the 5'-proximal ORF. 
(See reference 128 for evidence and further discussion of 
this.) Because reinitiation is usually inefficient with natu- 
ral mRNAs, the presence of short upstream ORFs usually 
reduces translation of the downstream ORF (69, 72, 102, 
185), albeit not as severely as were there no terminator 
codon between the upstream AUG codon and the down- 
stream ORF. Failure to consider the contribution of leaky 
scanning probably explains some reports (193, 262) in which 
the apparent requirements for reinitiation differed from what 
I have described here. 
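
The three conditions can be combined into a rough Python sketch that lists which AUG codons a scanning ribosome might use. It is a toy model under stated assumptions, not a validated predictor. The function names and the max_minicistron_nt cutoff of 90 nucleotides are arbitrary illustrative choices; the text gives no numerical limit on the size of an upstream ORF that still permits reinitiation. Index 0 is assumed to be the first nucleotide after the cap. The two-position strong/weak shorthand stands in for context. The sketch ignores the efficiency of each event and the effect of distance on reinitiation.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def is_strong(seq: str, i: int) -> bool:
    """Two-position shorthand: A at -3, or G at -3 together with G at +4."""
    minus3 = seq[i - 3] if i >= 3 else ""
    plus4 = seq[i + 3] if i + 3 < len(seq) else ""
    return minus3 == "A" or (minus3 == "G" and plus4 == "G")


def orf_end(seq: str, i: int):
    """Index just past the first in-frame stop codon after the AUG at i, or None."""
    for j in range(i + 3, len(seq) - 2, 3):
        if seq[j:j + 3] in STOP_CODONS:
            return j + 3
    return None


def predicted_start_sites(mrna: str, min_leader: int = 10,
                          max_minicistron_nt: int = 90) -> list:
    """List AUG positions that scanning ribosomes could plausibly use."""
    seq = mrna.upper().replace("T", "U")
    sites = []
    pos = seq.find("AUG")
    while pos != -1:
        sites.append(pos)
        is_first = len(sites) == 1
        if (is_first and pos < min_leader) or not is_strong(seq, pos):
            # Conditions (a) and (b): some subunits scan past this AUG.
            resume = pos + 3
        else:
            # Condition (c): reinitiation only after a short, terminated ORF.
            end = orf_end(seq, pos)
            if end is None or end - pos > max_minicistron_nt:
                break
            resume = end
        pos = seq.find("AUG", resume)
    return sites


main_orf = "AUG" + "GCC" * 40 + "UAA"
leaky = "C" * 15 + "AUG" + "C" * 11 + "ACC" + main_orf
print(predicted_start_sites(leaky))
```

One simplification should be noted: after a strong upstream AUG the search resumes at the terminator codon, so any AUG that lies inside the upstream ORF is skipped, consistent with the requirement that the terminator occur upstream from the second AUG codon.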

Some Harder Cases 

The popular press (88) has recently announced that picor- 
naviruses "break the rules" by allowing a 40S ribosomal 
subunit to bind directly to an internal site (somewhere up- 
stream from the start of the major ORF) in lieu of the usual 
end-dependent mode of entry. The key experiments (196) in- 
volve the translation of a dicistronic transcript of the form 
TK-PV-CAT, where the 736-nucleotide poliovirus leader se- 
quence (PV) separates the 5'-proximal thymidine kinase gene 
(TK) from the 3'-proximal chloramphenicol acetyltransfer- 
ase gene (CAT). COS cells transfected with the dicistronic 
vector clearly produced some CAT protein, but that result 
hardly warrants the conclusion that the presence of the polio- 
virus leader sequence allows the efficient translation of CAT 
by direct internal initiation. One problem is that efficiency 
was claimed without having been demonstrated; i.e., it was 
not shown how much CAT protein was produced in vivo from 
the dicistronic vector relative to a monocistronic CAT tran- 
script that bears a normal leader sequence.⁴ The most seri- 
ous deficiency, however, is that the Northern blot which was 
offered as proof that the dicistronic transcript is the only form 
of CAT mRNA in transfected cells was much too faint to 
prove the point. When translation of dicistronic mRNAs was 
studied in vitro, on the other hand, the unbound mRNA pool 
was found to be completely degraded after only 10 min of 
incubation, which makes it hard to believe that many, if any, 
transcripts were intact after 60 min, when the CAT yield was 



4. Indeed, the poliovirus leader sequence does not seem to support efficient 
translation even when it is at the 5'-end of a transcript. See the change in 
scale in Fig. 2 of reference 266. 
measured. In the absence of rigorous proof that dicistronic 
mRNAs are the only transcripts available for translation, the 
recent picornavirus experiments (90, 196) are no different 
from older in vitro experiments (see below) in which the ap- 
pearance of internal initiation has turned out to be an artifact. 
That some dicistronic mRNA was associated with polysomes 
(196) is not surprising, since the 5'-proximal TK cistron 
would be functional. To prove that the 3'-proximal CAT cis- 
tron could also function, the authors should have immuno- 
precipitated polysomes engaged in CAT synthesis and asked 
whether all (or any!) of the mRNA thus selected was dicis- 
tronic. In capable hands such experiments can yield answers 
(8). In a different approach, Pelletier and Sonenberg (196) 
showed that their dicistronic transcript formed "disomes" in 
the presence of sparsomycin, from which they assumed, 
without direct evidence, that one of the two ribosomes was 
bound to the PV sequence in the middle of the transcript. 
Years ago similar ideas were entertained with other viral 
mRNAs that formed disomes under conditions of initiation 
(201); but, in every instance where the ribosome-protected 
sites were sequenced, both sites mapped to the 5'-noncoding 
sequence (2, 202, 267). Thus the mere formation of disomes 
in sparsomycin-inhibited extracts is not evidence of a func- 
tional 3'-cistron. In view of all these shortcomings, all we yet 
know about the translation of poliovirus mRNA is that it oc- 
curs in the absence of the usual m7G cap and despite the bur- 
den of eight (mostly weak) upstream AUG codons (207). 

Turning from viral to cellular mRNAs, the 5-10% of ver- 
tebrate mRNAs⁵ that have upstream AUG codons are an in- 
teresting, nonrandom set. Many protooncogenes and growth- 
control genes produce mRNAs with upstream AUG codons 
(41, 68, 189, 215, 273, 283, 285)⁶ which may be lost during 
the rearrangements that accompany activation (156 and refer- 
ences therein). The cDNAs that have been characterized 
from homeobox genes are also likely to have upstream AUG 
codons (24, 239), although it is not clear that those cDNAs 
correspond to functional mRNAs. The transcript that corre- 
sponds to one AUG-burdened homeobox cDNA, for exam- 
ple, is restricted to the nucleus (170). Cellular genes that 
produce mRNAs with upstream AUG codons often use alter- 
native promoters and/or splice sites to generate supplemen- 
tary transcripts in which the leader sequences are less prob- 
lematical (68, 155, 179, 197, 210, 212, 223, 259, 269). Indeed, 
the 5' variability is sometimes so extensive that no two cDNAs 
from a given gene have the same 5' noncoding sequence (145, 
155)! This underscores the need to distinguish between func- 
tional and nonfunctional (or minimally functional) cDNAs 
and mRNAs, a difficult problem that only a few investigators 
have tackled (3, 204). It might be mentioned in passing that, 
among growth control as well as housekeeping genes, G-C- 
rich leader sequences are much more pervasive, and thus 
more of a potential problem, than upstream AUG codons (23, 
53, 77, 89, 134, 136, 138, 163, 166, 178, 191, 229, 231, 252, 
270, 280). In the case of the c-sis/PDGF-2 mRNA, which has 
both upstream AUG codons and a G-C-rich leader sequence, 
it is primarily the formation of secondary structure (implied 



by the G-C richness) that restricts translation in vivo (209).⁷ 
This is noteworthy because the inhibitory effects of secondary 
structure may be susceptible to environmental modulation 
(129) while there is, as yet, no evidence that the inhibitory ef- 
fects of upstream AUG codons are regulatable in vertebrates, 
as they are in the GCN4 gene of yeast (86, 220, 267a). In ver- 
tebrates, the solution to upstream AUG codons is to get rid 
of them (68, 155, 179, 197, 210, 212, 223, 259, 269). 

Lessons from cDNA Irregularities 

Atypical or erroneous cDNA sequences have sometimes 
been mistaken as evidence against the generality of the scan- 
ning mechanism. Among cellular mRNAs that were initially 
reported to contain a slew of upstream AUG codons, many 
of the worst offenders have been exonerated as more data has 
emerged. For example, some long, AUG-burdened leader 
sequences have been recognized belatedly as errors in clon- 
ing or sequencing (58, 97, 183, 268; 52 corrected in 286; 59 
corrected in 194; 110 corrected in 153; 5 corrected in 56; 34 
corrected in 103; 78 corrected in 139).⁶ In some cases the 
error was simply that the cloned cDNA did not include the 
entire coding sequence: an internal AUG codon was mis- 
takenly assumed to be the site of initiation, and therefore sev- 
eral upstream AUG codons, actually part of the coding se- 
quence, were thought to burden the 5'-noncoding sequence 
(9 corrected in 79; 21 corrected in 22; 28 corrected in 94; 
141 corrected in 142; 157 corrected in 254; 176 corrected in 
175). The spurious upstream AUG codons in some cDNA se- 
quences reflect derivation of the cDNA from a minor mRNA 
species that has an atypically long leader; the bulk of the 
transcripts from the same gene were shown to have a much 
shorter leader sequence, and no upstream AUG codons (95, 
111, 151 corrected in 71).⁶ As cloning efforts have gradually 
turned toward complicated regulatory genes, it has become 
fairly common to find cDNAs that correspond to incom- 
pletely processed transcripts (42, 63, 96, 146, 216, 225, 257). 
Thus, there are cases in which a bevy of out-of-frame AUG 
codons near the 5'-end of a cDNA sequence actually reside 
in an intron, which is not present, of course, in the functional 
mRNA (230, 236, 247, 272, 147 corrected in 64). For some 
Drosophila transcripts the excision of a 5'-intron, and conse- 
quent activation of translation, are developmentally regu- 
lated (17). If regulated (or merely inefficient) splicing is more 
widespread than we realize at present, most of the still-prob- 
lematical cDNA sequences with multiple upstream AUG 
codons (e.g., 57, 83, 108, 177, 186, 206, 256) might eventu- 
ally be traced to intron-containing pre-mRNAs. There are 
some mammalian cDNA sequences with AUG-burdened 
leader sequences that have not yet been formally recognized 
as introns, but that possibility is consistent with their local- 
ization in the nucleus (20, 63, 170); or their inability to be 
translated unless the upstream AUG codons are removed 
(171, 174, 187); or the presence of a typical 3' splice junction 
motif at the point of divergence between two cDNA sequences 
(compare 154 with 260, 224 with 159, 135 with 152, and 
cDNAs a and b in reference 149); or the fact that an intron 



5. This number, which comes from reference 127, is probably inflated for 
reasons described in the next section of the text. 

6. See reference 127 for additional documentation. 



7. In other cases (47, 107), the regulatory effect of leader sequences was 
demonstrated by deletion mutagenesis rather than site-directed mutagenesis; 
thus it is not clear whether an upstream AUG codon or the potential for sec- 
ondary structure accounts for the observed inefficient translation of the 
wild-type sequence. 
interrupts the 5'-noncoding sequence in other members of the 
gene family (compare 133 with 18, and 205 with 19). Many 
of the anomalous deductions about upstream AUG codons in 
mRNA sequences are favored, unfortunately, by the common 
practice of selecting and sequencing the longest cDNA clone! 
The main point here is not that cellular mRNA leader se- 
quences never have upstream AUG codons but that such se- 
quences are not nearly as common as superficial reading of 
the literature would suggest. When upstream AUG codons 
do occur they are a red flag, a clue to expect some sort of 
regulation (such as promoter switching or regulated splicing) 
or, at least, inefficient translation of the gene in question. 
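
As a practical aid, the "red flag" check can be automated with a few lines of Python. This is a minimal sketch under stated assumptions: the function name upstream_aug_report and the example sequence are hypothetical, and the annotated start position must be supplied by the user. The sketch reports every AUG (or ATG, for cDNA input) that begins upstream of the annotated initiator codon, together with whether it is in frame with that codon. A hit does not by itself show that a sequence is erroneous; it only marks the leader for the closer scrutiny recommended above.

```python
from typing import List, Tuple


def upstream_aug_report(cdna: str, annotated_start: int) -> List[Tuple[int, bool]]:
    """Return (index, in_frame) for each AUG 5' of the annotated start codon.

    annotated_start is the 0-based index of the A of the annotated AUG/ATG.
    in_frame is True when the upstream AUG shares its reading frame.
    """
    seq = cdna.upper().replace("T", "U")
    if seq[annotated_start:annotated_start + 3] != "AUG":
        raise ValueError("annotated start is not an AUG/ATG codon")
    report = []
    for i in range(annotated_start):
        if seq[i:i + 3] == "AUG":
            in_frame = (annotated_start - i) % 3 == 0
            report.append((i, in_frame))
    return report


# Hypothetical cDNA with one out-of-frame ATG in its leader.
print(upstream_aug_report("GGATGCCACCATGGCC", 10))
```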

Artifacts of Cell-free Translation Systems 

The results of some in vitro translation experiments have 
been taken as evidence against the generality of scanning. 
Five examples are often cited as evidence that eukaryotic 
ribosomes can bind directly to a site in the interior of a mes- 
sage without having traversed the upstream sequence. Rice 
et al. (213) have discussed the evidence for "internal initia- 
tion" in flaviviruses and have suggested credible alternative 
interpretations. As I cannot improve on their cogent discus- 
sion, I will simply recommend it to the reader and move on. 
For each of the other four examples, a synopsis of the argu- 
ments for and against internal initiation follows. 

(a) An SP6 vector-derived transcript corresponding to the 
adenovirus DNA polymerase gene directed the in vitro syn- 
thesis of the 120K (nearly full-sized) polymerase and a 62K 
polypeptide that mapped to the 3'-end of the pol coding se- 
quence (80). Because cDNA corresponding to the 5'-portion 
of pol arrested the translation of the 120K but not the 62K 
product, the authors suggested that there is an independent 
ribosome-entry site at the midpoint of the pol ORF, and that 
the 62K protein was initiated internally. A less heretical ex- 
planation is that nuclease attack might have generated an 
mRNA fragment in which the initiator codon for 62K was 
near the 5'-end. The authors considered that explanation less 
likely because the translation results did not change when 
RNasin (an inhibitor of RNase) was included, and because 
32P-labeled mRNA was only slightly degraded during the 
first 15 min of incubation in the reticulocyte extract. In retro- 
spect, however, either RNA cleavage or some other artifact 
must have occurred in the in vitro experiments, because the 
62K protein was not synthesized in vivo from a plasmid that 
produced an abundance of functional 140K DNA polymer- 
ase (237). 

(b) When vesicular stomatitis virus mRNA P was trans- 
lated in a reticulocyte extract, the products were the full- 
sized P protein and a 7K COOH-terminal fragment thereof 
(84). The 7K polypeptide was attributed to internal initiation 
based on hybrid arrest experiments; i.e., a cDNA fragment 
complementary to the 5'-end of mRNA-P abolished transla- 
tion of P but not the 7K protein. RNA cleavage was consid- 
ered a less likely explanation because RNasin was present 
during the incubation and because the relative yield of 7K 
did not increase with time, as it might have were its template 
a degraded RNA. The second argument is weak, however, 
because one cleavage event might activate an mRNA frag- 
ment for 7K translation while the next clip, within the 7K 
coding sequence, might inactivate it. RNasin is not adequate 
insurance, as proven by the adenovirus story. The fact that
cap analogues did not inhibit 7K translation (85) is equally
consistent with internal initiation or initiation at the 5'-end 
of a broken template. A strong hint that the 7K protein is an 
artifact came from an experiment in which a transcript 
representing only the 3'-portion of the P gene (derived by
subcloning that region into an SP6 vector) was translated in 
vitro: three small polypeptides were made, and the two
"nonphysiological" products were more abundant than the
"authentic" 7K band (84). Virus-infected cells contain barely
detectable amounts of a protein that does not exactly comigrate
with the 7K in vitro band (84) and might itself be a degrada- 
tion product. 

(c) In the case of infectious pancreatic necrosis virus, the 
claim (162) is that genomic RNA-A directs the independent 
synthesis of three major proteins: VP2, NS, and VP3. There 
is no absolute need to postulate three initiation events inas- 
much as the sequence of genome segment A reveals a single 
ORF that encodes VP2-NS-VP3 as a fusion protein (49), and 
a protease activity that maps with NS is able to release the 
mature proteins from the common precursor (50). Thus the 
speculation is that, in addition to the polyprotein mode of 
translation, NS and VP3 can be translated by a second mech- 
anism which involves internal initiation. The main support- 
ing evidence seems to be that, during a short pulse with 
[35S]methionine in vitro, VP2, NS, and VP3 acquired label
simultaneously (162). But that observation is equally consis- 
tent with independent internal initiation of three proteins 
from one mRNA or translation of each protein from an inde- 
pendent template, generated by RNA degradation. In addi- 
tion to VP2, NS, and VP3, in vitro translation of RNA-A pro- 
duced countless other polypeptides that are never seen in 
vivo (50, 162), which is strong reason to suspect an in vitro 
artifact. The NH2-terminal amino acid sequence of virion-derived
VP3 has not yet been analyzed to ascertain the presence
of either methionine, which would be consistent with
internal initiation, or a protease-recognition sequence, which 
would implicate a polyprotein as the sole source of VP3 in 
vivo. Lacking such evidence, the argument for internal initi- 
ation is weak. 

(d) The last example is poliovirus. When RNA from viri- 
ons was translated in a reticulocyte lysate, the earliest de- 
tected products mapped to the 3' portion of the genome (region 
P3), suggesting internal initiation site(s) (200). In support of 
that idea, N-formyl[35S]methionine (a marker of initiation)
was incorporated into polypeptides from region P3 (46). The 
question is whether internal start sites are accessible to ribo- 
somes in intact poliovirus mRNA or whether RNA frag- 
ments are the functional templates. The anomalous internal 
sites detected in the reticulocyte system did not function in 
HeLa cell extracts, nor when an aliquot of HeLa cell extract 
was added to the reticulocyte lysate. The authors' interpreta- 
tion was that HeLa cell extracts contain undefined factors 
that promote 5' initiation, and "the deficiency (of those fac- 
tors in the reticulocyte system) resulted in the ability of ribo- 
somes to initiate translation on internal sequences" (46). One 
might think that deficiency of a required component would 
preclude translation, rather than endow ribosomes with a 
novel power; but that is debatable. The fact is that no virus- 
promoting initiation factor has yet been purified from HeLa 
cells, but HeLa extracts have been shown to contain an RNase 
inhibitor (249)! Finally, an experiment that might have detected
internal initiation in vivo failed to do so. The experiment
involved poliovirus mutants in which small changes in
the 5' noncoding sequence depressed translation from the
normal 5' proximal site (265). Were there an independent ini- 
tiation site in the interior of the poliovirus genome, it should 
have remained functional; but the residual level of translation 
in mutant-infected cells showed no enrichment for P3 prod- 
ucts. Thus, although the mechanism of initiation at the 5' 
proximal site in poliovirus mRNA remains unclear, the exis- 
tence of an internal initiation site in the 3' third of the genome 
can almost certainly be dismissed as an artifact. 

Curran and Kolakofsky (38) have interpreted some recent 
results from the Sendai virus system as evidence that ribosomes
bind to the 5' end of the "P" transcript and jump some
1,500 nucleotides to the start of the "X" ORF, thus translating 
the X protein by a cap-dependent, scanning-independent 
process. Their hypothesis is based largely on the ability of 
cap analogues to inhibit X synthesis, but the hypothesis is 
contradicted by the virtual absence of X translation in vitro 
unless the P transcript is cleaved! In view of the many prece- 
dents for activation of internal initiation sites by mRNA 
cleavage (13, 116, 140, 195) and for inhibition of translation 
of uncapped transcripts by cap analogues (15, 61, 100, 227, 
246), the interpretation offered by Curran and Kolakofsky 
seems unwarranted. 

The main point here is not to debunk a few spurious claims 
against the generality of scanning but to illustrate the need 
for caution whenever cell-free systems are used to study 
translation. The tendency to see "internal initiation" more of- 
ten in reticulocyte (12, 14, 40, 73, 180, 238, 264, 271, 279, 
284) than in wheat germ translation systems might simply 
reflect the greater ease of translating broken (hence un- 
capped) mRNAs in the former system. A final caution is that 
initiation at non-AUG codons occurs far more efficiently in 
vitro than in vivo (4, 74, 192, 275) and therefore cell-free sys- 
tems are not a reliable way to explore the rare, interesting sit- 
uations in which eukaryotic ribosomes seem to initiate at 
codons other than AUG (10, 11, 37, 74). The usual failure of 
eukaryotic ribosomes to initiate at non-AUG codons in vivo 
is illustrated by the ability of point mutations in the AUG ini- 
tiator codon to abolish gene expression (27, 32, 44, 72, 182, 
203, 211, 217, 258). Conversely, bacterial genes that initiate 
with a GUG codon require the substitution of an AUG codon 
for successful expression in mammalian cells (137, 240). 

Putting the Steps Together 

Here is a brief statement of what we know and what we have 
yet to learn about the first three steps in initiation: binding 
of the 40S-ribosome/Met-tRNA/factor complex to mRNA; 
scanning; and recognition of the AUG initiator codon. 

Step 1

The ubiquitous m7G cap and the associated cap-binding protein(s)
(233) explain the predilection of eukaryotic ribosomes
to engage mRNAs at the 5'-end. The initiation mechanism
is end dependent even in those uncommon instances in
which a cap is absent, however, prompting the idea that the
40S subunit might thread onto the 5'-end of mRNA (113).
Microscopic data obtained with a new image processing 
technique indeed suggest the possibility of a channel running
through the neck region of the 40S ribosomal subunit
(60) which could be the needle's eye. (If the scanning mechanism
were really a threading mechanism, the ability of 40S
ribosomal subunits to hold on and reinitiate downstream 
might almost have been expected!) Apart from cap-recogni- 
tion factor(s), we do not understand the precise function of 
any of the initiation factors that mediate early step(s) in the 
binding of mRNA to ribosomes. The remarkable finding (45) 
that a point mutation in eIF2 (the Met-tRNAiMet-binding factor)
affects where ribosomes initiate on mRNA illustrates the
potential for surprises in this area. 

Step 2 

All we yet know about the mechanism of 40S subunit loco- 
motion is that ATP is required (115), in common with other 
proteins that bind to nucleic acids and then (in an equally un- 
known manner) slide (25, 54). Scanning seems to be facili- 
tated by something in the cytoplasm of mammalian cells, ei- 
ther a soluble protein or a 40S ribosome-associated activity, 
that has considerable ability to unwind duplex structures in 
the 5'-noncoding region of mRNA (123). The 40S subunit 
apparently migrates linearly, as evidenced by the ability of 
an upstream AUG codon to completely suppress initiation 
from a downstream site, even when the downstream AUG 
codon occurs in the same context as, and only five nucleo- 
tides beyond, the first AUG triplet (Fig. 7 in reference 121). 
The inability of ribosomes to "jump" a hairpin structure that 
is too stable to be melted is further evidence of the linearity 
of scanning (123). We do not know whether the 40S ribo- 
some advances nucleotide-by-nucleotide or triplet-by-triplet; 
but, if it is the latter, each entering ribosome must pick at 
random any one of the three possible frames. If scanning 
were uniquely phased, a one- or two-nucleotide insertion be- 
tween the cap and the AUG codon should drastically impair 
translation; in fact, such mutations are usually innocuous. 
An interesting possibility is that 40S ribosomal subunits 
might be nudged into the correct phase by the GCC or ACC 
motif that immediately precedes the AUG codon in vertebrate
mRNAs. That would explain the ability of GCC to enhance
initiation when the purine occurs in position -3 or
(less effectively) position -6, but not when the GCC motif
is shifted out-of-phase with respect to the AUG codon (122, 
126). The observation that increasing the length of the leader 
sequence never impairs (and sometimes enhances) transla- 
tion (129) suggests that, at least in the absence of secondary 
structure, scanning is not the rate limiting step in initiation. 

Step 3 

The migrating 40S ribosomal subunit stalls at the first AUG 
codon, which is recognized in large part by base pairing with 
the anticodon in Met-tRNAiMet (30). The stop-scanning
step, and hence selection of the initiator codon, is susceptible 
to modulation, however: by context, at least in vertebrates 
(122, 126); by mutant forms of yeast eIF2 that compensate 
for (stabilize?) a weakened codon/anticodon interaction (45); 
by the antibiotic edeine (131); and by varying the concentration
of Mg2+ and other ions in cell-free translation systems
(114, 192, 221, 275). 
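
The three steps summarized above can be illustrated with a minimal
sketch, which is a toy model and not a description of any published
software. It assumes a capped, unstructured mRNA given as a plain
string; the function names scan_for_initiator and purine_at_minus3
are hypothetical, and "context" is reduced to the single feature
discussed here, a purine in position -3. Secondary structure, edeine,
ion concentrations, and eIF2 mutations are deliberately ignored.

```python
def normalize(seq: str) -> str:
    """Upper-case the sequence and write it as RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def scan_for_initiator(mrna: str):
    """Steps 1-3 in caricature: enter at the 5'-end, advance linearly,
    and stop at the first AUG triplet. Returns its 0-based index, or
    None if the sequence contains no AUG."""
    seq = normalize(mrna)
    for pos in range(len(seq) - 2):  # one nucleotide at a time (an assumption)
        if seq[pos:pos + 3] == "AUG":
            return pos
    return None


def purine_at_minus3(mrna: str, aug_pos: int) -> bool:
    """True if the nucleotide three positions upstream of the A of the
    AUG codon is a purine (A or G); False if it is a pyrimidine or if
    the leader is too short to have a position -3."""
    seq = normalize(mrna)
    if aug_pos < 3:
        return False
    return seq[aug_pos - 3] in "AG"


if __name__ == "__main__":
    example = "GCCGCCACCAUGGCAUGA"  # invented sequence, for illustration only
    start = scan_for_initiator(example)
    print(start, purine_at_minus3(example, start))
```

In this sketch the downstream AUG in the example is never reached,
which mirrors the suppression of a downstream site by an upstream AUG
codon described under Step 2.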

Implications, Limitations, and Alternatives 

In addition to correctly predicting the start sites for translation
in the majority of viral and cellular mRNAs, the scanning
model has informed the design of a variety of experiments,
including the construction of mammalian vectors that
can initiate translation in all three reading frames (169) and 
the development of a "translational assay" for monitoring 
immunoglobulin gene rearrangements (55). The scanning 
mechanism explains the considerable effort expended by 
viruses (125) and some cells (93) to convert polycistronic 
transcripts into translatable (monocistronic) mRNAs; ration- 
alizes an alternative strategy where several (up to seven!) 
cistrons are fused to generate some remarkable multifunc- 
tional proteins (62, 82, 98, 158); and justifies the observed 
inefficient translation of some critical genes, including some 
protooncogenes (156, 204, 209). The inhibitory effect of up- 
stream AUG codons has been incorporated into an interest- 
ing model regarding the mechanism of allelic exclusion (143). 

The scanning model is compatible with various modes of 
negative regulation, such as inhibition by upstream AUG 
codons (86, 267a), or by a repressor protein (219), or by sec- 
ondary structure within the 5'-noncoding region; but the 
scanning mechanism seems incompatible with intricate 
schemes for enhancing the translation of specific mRNAs. 
Indeed, apart from the ubiquitous m7G cap and the con- 
served sequence around the initiator codon, the only other 
defined feature that has been shown to increase translational 
efficiency in eukaryotes is a long, unstructured 5'-sequence
(129). Although leader-shuffling experiments have documented 
the ability of certain cellular and viral 5'-sequences to sup- 
port efficient translation (16, 67, 106, 150, 253, 278), the ap- 
plication of mutagenesis techniques has consistently failed to 
pinpoint an effector motif within such leader sequences (160, 
242). Thus, the data so far support the prediction (125) that, 
beyond the m7G cap and a favorable context near the AUG 
codon, what makes a "good" 5'-noncoding sequence is merely
the absence of any unfavorable features.

An alternative to the three-step initiation mechanism de- 
scribed above is that 40S ribosomal subunits might bind 
directly to an internal site in some rare mRNAs. Although 
no credible evidence yet supports that possibility, it can 
never be ruled out. Neither will it ever become a fact unless 
the deficiencies in current experiments are recognized. One 
cannot ignore mRNA degradation in vitro as if it were a mi- 
nor irritation. In vivo experiments have other limitations. 
The most carefully executed Northern assays sometimes fail 
to detect transcripts which, genetic evidence tells us, must 
be present and functional in cells (10). When it comes to 
reporter genes, the more sensitive the assay for protein ex- 
pression, the harder it is to pinpoint the functional mRNA. 
Thus, a recent report of supposedly efficient expression of 
adenosine deaminase from the 3'-end of a dicistronic vector
(161) was interpreted more cautiously when the high specific 
activity of adenosine deaminase was taken into account (184). 

The existence of credible rules for initiation calls attention 
to occasional mRNAs that break the rules. In fact, very few 
viral mRNAs remain incompletely understood. It seemed for 
a while as if poxviruses might follow different rules for trans- 
lation inasmuch as, in nearly all of the viral late genes, the 
ATG initiator codon is preceded at the DNA level by a T in 
position -3 (218, 276); we now know, however, that a complicated
transcriptional mechanism replaces the undesirable
T with the preferred A in position -3 (190, 226). I still cannot
explain the translation of picornaviruses, cauliflower mosaic
virus (43), or the putative bifunctional mRNAs from
Epstein-Barr virus (274) and cottontail rabbit papillomavirus
(7). In the last two cases, however, the transcription patterns
are complex and it seems possible that the downstream protein
is translated, not from the 3'-end of the recognized bicistronic
mRNA, but from a scarce monocistronic mRNA
that has not yet been detected. One way to rationalize the pe- 
culiar leader sequences on retrovirus and poliovirus mRNAs 
is to suppose that features that compromise translation are
tolerated because those very features are required to replicate 
(33) and package the viral genome (1). Moreover, viruses can 
and do compensate for inefficient translation by using their 
efficient transcription signals to flood the cell with mRNA 
(125). As for the structure of cellular genes, I have argued 
herein that the presence of numerous upstream AUG codons 
in a cDNA sequence constitutes a strong hint that the cDNA 
might represent an intron-containing pre-mRNA, rather than 
the functional mRNA from the gene in question. The growing
number of such sequences in the vertebrate cDNA catalogue 
raises the interesting possibility that the final, regulated step 
in the expression of many critical genes is the conversion of 
a stable, untranslatable precursor to a functional mRNA. 
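
The check suggested by that argument, counting AUG (ATG) triplets
that lie upstream of the annotated start codon in a cDNA, can be
sketched as follows. This is only an illustration under stated
assumptions: the function name upstream_atg_positions is hypothetical,
the caller is assumed to know the 0-based index of the annotated
initiator codon, and no threshold for "numerous" is implied, since
the text does not give one.

```python
def upstream_atg_positions(cdna: str, annotated_start: int) -> list:
    """Return the 0-based positions of every ATG triplet that begins
    5' of the annotated initiator codon, in any reading frame."""
    seq = cdna.upper().replace("U", "T")  # accept RNA or DNA spelling
    if seq[annotated_start:annotated_start + 3] != "ATG":
        raise ValueError("annotated_start does not point at an ATG codon")
    return [
        pos
        for pos in range(annotated_start)
        if seq[pos:pos + 3] == "ATG"
    ]


if __name__ == "__main__":
    # Invented sequence: two upstream ATGs precede the annotated start at 13.
    cdna = "ATGAAATGCCACCATGGCT"
    hits = upstream_atg_positions(cdna, 13)
    print(len(hits), hits)
```

A nonzero count is, at most, a prompt to ask whether the clone
represents an unspliced precursor; it does not by itself establish
that.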